SIMP: Accurate and Efficient Near Neighbor Search in Very High Dimensional Spaces
Abstract
Near neighbor search in very high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only for the approximate case. These solutions are designed to answer r-near neighbor queries only for a fixed query range, or for a set of query ranges, with probabilistic guarantees, and are then extended to nearest neighbor queries. Solutions supporting a set of query ranges suffer from prohibitive space cost. Many applications are quality sensitive and need efficient, accurate support for near neighbor queries over all query ranges. In this paper, we propose a novel indexing and querying scheme called Spatial Intersection and Metric Pruning (SIMP) that efficiently supports r-near neighbor queries in very high dimensional spaces for all query ranges, with a 100% quality guarantee and practical storage costs. We also provide a statistical cost model for SIMP. SIMP outperforms LSH, Multi-Probe LSH, and LSB tree on two real datasets, of 128 and 256 dimensions and of 1 million and 1.08 million points respectively. We show on a 128-dimensional real dataset of 10 million points that SIMP scales linearly with increasing query range.
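To make the query type concrete, the following is a minimal sketch of an exact r-near neighbor query answered by a linear scan. It is not SIMP's indexing scheme, which the abstract does not detail; it only illustrates the result set that an exact method must return for any query range r. The Euclidean metric and the function names `r_near_neighbors` and `nearest_neighbor` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def r_near_neighbors(points: Sequence[Sequence[float]],
                     query: Sequence[float],
                     r: float) -> List[int]:
    """Return indices of all points within distance r of the query.

    Exact brute-force baseline: every point is compared with the query,
    so the cost is linear in the dataset size for every query.
    Euclidean distance is an assumption; the abstract names no metric.
    """
    return [i for i, p in enumerate(points) if math.dist(p, query) <= r]


def nearest_neighbor(points: Sequence[Sequence[float]],
                     query: Sequence[float]) -> int:
    """Return the index of the point closest to the query (exact scan)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


if __name__ == "__main__":
    data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
    print(r_near_neighbors(data, (0.0, 0.0), 5.0))
    print(nearest_neighbor(data, (9.0, 9.0)))
```

Because the scan returns every point within range regardless of r, it serves as a correctness reference for the "all query ranges" requirement; an index is only worthwhile if it returns the same set while examining far fewer points.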
Similar Resources
Fast Near Neighbor Search in High-Dimensional Binary Data
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating has...
RIVA: Indexing and Visualization of High-Dimensional Data Via Dimension Reorderings
We propose a new representation for high-dimensional data that can prove very effective for visualization, nearest neighbor (NN) and range searches. It has been unequivocally demonstrated that existing index structures cannot facilitate efficient search in high-dimensional spaces. We show that a transformation from points to sequences can potentially diminish the negative effects of the dimensi...
On Optimizing Nearest Neighbor Queries in High-Dimensional Spaces
Nearest-neighbor queries in high-dimensional space are of high importance in various applications, especially in content-based indexing of multimedia data. For an optimization of the query processing, accurate models for estimating the query processing costs are needed. In this paper, we propose a new cost model for nearest neighbor queries in high-dimensional space, which we apply to enhance t...
On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces
Nearest-neighbor queries in high-dimensional space are of high importance in various applications, especially in content-based indexing of multimedia data. For an optimization of the query processing, accurate models for estimating the query processing costs are needed. In this paper, we propose a new cost model for nearest neighbor queries in high-dimensional space, which we apply to enhance t...
Fast Nearest Neighbor Search in High-Dimensional Space
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor se...